NSF PAR Search | NSF Public Access Repository

Decision Making in Non-Stationary Environments with Policy-Augmented Search

Pettet, Ava; Zhang, Yunuo; Luo, Baiting; Wray, Kyle; Baier, Hendrik; Laszka, Aron; Dubey, Abhishek; Mukhopadhyay, Ayan (May 2024, International Conference on Autonomous Agents and Multiagent Systems)

Sequential decision-making under uncertainty is present in many important problems. Two popular approaches for tackling such problems are reinforcement learning and online search (e.g., Monte Carlo tree search). The former learns a policy by interacting with the environment, typically before execution. The latter uses a generative model of the environment to sample promising action trajectories at decision time. Decision-making is particularly challenging in non-stationary environments, where the environment in which an agent operates can change over time. Both approaches have shortcomings in such settings. On the one hand, policies learned before execution become stale when the environment changes, and relearning takes both time and computational effort. On the other hand, online search can return sub-optimal actions when its runtime is limited. In this paper, we introduce Policy-Augmented Monte Carlo tree search (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action, and we also bound the error accrued while following PA-MCTS as a policy. We compare and contrast our approach with AlphaZero, another hybrid planning approach, and with Deep Q-Learning on several OpenAI Gym environments. Through extensive experiments, we show that in non-stationary settings with tight time constraints, PA-MCTS outperforms these baselines.
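The abstract describes PA-MCTS only at a high level. The Python below is a minimal sketch of the underlying idea: blending an out-of-date policy's action values with values from online search. It assumes the two estimates are combined convexly, weighted by a hypothetical parameter `alpha`. To keep the example short, it replaces full MCTS with flat Monte Carlo rollouts. The helper names (`rollout_q`, `pa_select`, `changed_env`) and the toy environment are illustrative assumptions, not the authors' code.

```python
import random
from typing import Callable, Dict, Hashable, Sequence, Tuple

# Hypothetical generative-model signature: (state, action, rng) -> (next_state, reward, done).
Model = Callable[[object, Hashable, random.Random], Tuple[object, float, bool]]


def rollout_q(model: Model, state, actions: Sequence[Hashable], n_rollouts: int = 50,
              horizon: int = 10, gamma: float = 0.95, seed: int = 0) -> Dict[Hashable, float]:
    """Flat Monte Carlo estimate of Q(state, a) from the up-to-date model.

    A deliberately simplified stand-in for MCTS: each action is tried first,
    then the episode continues with uniformly random actions.
    """
    rng = random.Random(seed)
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(n_rollouts):
            s, r, done = model(state, a, rng)
            ret, discount, steps = r, gamma, 1
            while not done and steps < horizon:
                s, r, done = model(s, rng.choice(actions), rng)
                ret += discount * r
                discount *= gamma
                steps += 1
            total += ret
        q[a] = total / n_rollouts
    return q


def pa_select(q_policy: Dict[Hashable, float], q_search: Dict[Hashable, float],
              alpha: float) -> Hashable:
    """Assumed selection rule: argmax_a of alpha * Q_policy(a) + (1 - alpha) * Q_search(a)."""
    return max(q_search, key=lambda a: alpha * q_policy[a] + (1 - alpha) * q_search[a])


def changed_env(state, action, rng):
    # Toy post-change environment: only "right" now pays off, and every action ends the episode.
    return state, (1.0 if action == "right" else 0.0), True


stale_q = {"left": 1.0, "right": 0.2}  # hypothetical estimates from a policy trained before the change
fresh_q = rollout_q(changed_env, state=0, actions=["left", "right"])
print(pa_select(stale_q, fresh_q, alpha=0.3))  # leans on the search: "right"
print(pa_select(stale_q, fresh_q, alpha=0.9))  # leans on the stale policy: "left"
```

A larger `alpha` places more trust in the stale policy. A smaller one leans on the up-to-date model, which matters once the environment has changed.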
Competence-Aware Path Planning Via Introspective Perception

https://doi.org/10.1109/LRA.2022.3145517

Rabiee, Sadegh; Basich, Connor; Wray, Kyle Hollins; Zilberstein, Shlomo; Biswas, Joydeep (April 2022, IEEE Robotics and Automation Letters)

Belief Space Metareasoning for Exception Recovery

Svegliato, Justin; Wray, Kyle; Witwicki, Stefan; Biswas, Joydeep; Zilberstein, Shlomo (October 2019, IROS)

This paper has been submitted and is under review. Please do not cite or distribute.
Meta-Level Control of Anytime Algorithms with Online Performance Prediction

https://doi.org/10.24963/ijcai.2018/208

Svegliato, Justin; Wray, Kyle Hollins; Zilberstein, Shlomo (July 2018, Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence)

Anytime algorithms enable intelligent systems to trade computation time for solution quality. To exploit this crucial ability in real-time decision-making, the system must decide when to interrupt the anytime algorithm and act on the current solution. Existing meta-level control techniques, however, address this problem by relying on significant offline work that diminishes their practical utility and accuracy. We formally introduce an online performance prediction framework that enables meta-level control to adapt to each instance of a problem without any preprocessing. Using this framework, we then present a meta-level control technique and two stopping conditions. Finally, we show that our approach outperforms existing techniques that require substantial offline work. The result is efficient nonmyopic meta-level control that reduces the overhead and increases the benefits of using anytime algorithms in intelligent systems.
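The abstract does not spell out the performance model or the two stopping conditions. The sketch below is a hedged illustration of online performance prediction for meta-level control, built on three assumptions. First, solution quality grows like `a + b * ln(t)`. Second, that curve is fitted by least squares to the qualities observed so far in the current run, so no offline preprocessing is needed. Third, utility is quality minus a linear time cost. The profile shape, the cost model, and the names `fit_log_profile` and `should_stop` are all hypothetical.

```python
import math
from typing import List, Tuple


def fit_log_profile(history: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of quality(t) ~ a + b * ln(t) to observed (time, quality) pairs.

    Assumes t > 0 and at least two distinct times (hypothetical profile shape).
    """
    xs = [math.log(t) for t, _ in history]
    ys = [q for _, q in history]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def should_stop(history: List[Tuple[float, float]], cost_per_second: float,
                deadline: float, step: float = 1.0) -> bool:
    """Nonmyopic stopping test on an assumed utility U(t) = quality(t) - cost * t.

    Stop now if no future time up to the deadline has a higher predicted
    utility than the current solution's utility.
    """
    a, b = fit_log_profile(history)
    t_now, q_now = history[-1]
    current_utility = q_now - cost_per_second * t_now
    t = t_now + step
    while t <= deadline:
        predicted = a + b * math.log(t) - cost_per_second * t
        if predicted > current_utility:
            return False  # continuing is predicted to pay off
        t += step
    return True


# Synthetic run whose quality follows 0.2 + 0.1 * ln(t) exactly.
history = [(t, 0.2 + 0.1 * math.log(t)) for t in (1.0, 2.0, 3.0, 4.0, 5.0)]
print(should_stop(history, cost_per_second=0.01, deadline=20.0))
print(should_stop(history, cost_per_second=0.05, deadline=20.0))
```

When time is cheap, the predicted utility peaks later, so the algorithm keeps running. When time is expensive, it is interrupted immediately.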
Belief-Space Planning for Automated Malware Defense

Svegliato, Justin; Wray, Kyle; Witwicki, Stefan; Biswas, Joydeep; Zilberstein, Shlomo (July 2018, IJCAI-ECAI Workshop on AI for Internet of Things)

Malware detection and response is critical to ensuring information security across a wide range of devices. There have been few attempts, however, to develop security systems that exploit the benefits of different malware detection techniques. We formally introduce an automated malware defense framework and represent it as a belief-space planning problem that optimally reduces the impact on system performance. Using the framework, we then provide an example automated malware defense system for email worm detection and response. Finally, we show in simulation that the system outperforms standard security techniques that have been used in practice. The result is a novel belief-space planning approach to automated malware defense designed for robust, accurate, and efficient use in large networks of resource-constrained devices.
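Belief-space planning acts on a probability distribution over hidden states rather than on the state itself. The Python sketch below shows the two basic pieces for a single device whose hidden state is `clean` or `infected`. The first is a Bayes-filter belief update from a noisy detector alert. The second is an action choice that minimizes expected cost under the current belief. All probabilities, costs, and action names are hypothetical. The greedy one-step rule is a simplification of full belief-space planning, not the framework described in the paper.

```python
# Hypothetical costs: COSTS[action][hidden_state]; not taken from the paper.
COSTS = {
    "ignore":     {"clean": 0.0, "infected": 10.0},
    "scan":       {"clean": 1.0, "infected": 3.0},
    "quarantine": {"clean": 4.0, "infected": 0.5},
}


def update_belief(b_infected: float, alert: bool, p_infect: float = 0.05,
                  tpr: float = 0.9, fpr: float = 0.1) -> float:
    """One Bayes-filter step over {clean, infected} (hypothetical rates)."""
    # Predict: a clean device may become infected; an infection persists.
    prior = b_infected + (1.0 - b_infected) * p_infect
    # Correct: weight by the detector's likelihood of this observation.
    like_infected = tpr if alert else 1.0 - tpr
    like_clean = fpr if alert else 1.0 - fpr
    numerator = like_infected * prior
    return numerator / (numerator + like_clean * (1.0 - prior))


def choose_action(b_infected: float) -> str:
    """Greedy one-step choice: minimize expected cost under the current belief."""
    def expected_cost(action: str) -> float:
        c = COSTS[action]
        return (1.0 - b_infected) * c["clean"] + b_infected * c["infected"]
    return min(COSTS, key=expected_cost)


belief = 0.0
for alert in (False, True, True):
    belief = update_belief(belief, alert)
    print(f"alert={alert} belief={belief:.3f} action={choose_action(belief)}")
```

As alerts accumulate, the belief in infection rises. The chosen response escalates from `ignore` to `scan` to `quarantine`.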
Online Decision-Making for Scalable Autonomous Systems

https://doi.org/10.24963/ijcai.2017/664

Wray, Kyle Hollins; Witwicki, Stefan J.; Zilberstein, Shlomo (August 2017, International Joint Conference on Artificial Intelligence)

We present a general formal model called MODIA that can tackle a central challenge for autonomous vehicles (AVs), namely the ability to interact with an unspecified, large number of world entities. In MODIA, a collection of possible decision problems (DPs), known a priori, is instantiated online and executed as decision components (DCs), unknown a priori. To combine the individual action recommendations of the DCs into a single action, we propose the lexicographic executor action function (LEAF) mechanism. We analyze the complexity of MODIA and establish LEAF’s relation to regret minimization. Finally, we implement MODIA and LEAF using collections of partially observable Markov decision process (POMDP) DPs and use them for complex AV intersection decision-making. We evaluate the approach in six scenarios within an industry-standard vehicle simulator and present its use on an AV prototype.
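The sketch below makes the MODIA vocabulary concrete. It instantiates one decision component (DC) per observed entity from a single decision-problem (DP) template and retires DCs whose entity disappears. It then combines their recommendations. It assumes a simplified reading of LEAF in which the executor picks the recommendation ranked first under a fixed, hypothetical preference order (`stop` before `edge` before `go`). This is not necessarily the paper's definition. The DP here is a hand-written rule standing in for a solved POMDP, and the default of `go` when no entities are present is likewise an assumption. None of the names come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical action preference for an intersection, most to least conservative.
PREFERENCE: Sequence[str] = ("stop", "edge", "go")


@dataclass
class DecisionComponent:
    """A DC: one instance of a DP template, bound to a single world entity."""
    entity_id: str
    policy: Callable[[dict], str]

    def recommend(self, observation: dict) -> str:
        return self.policy(observation)


def vehicle_dp_policy(observation: dict) -> str:
    """Hypothetical DP template for one other vehicle (stand-in for a solved POMDP)."""
    if observation["distance"] < 10:
        return "stop"
    if observation["distance"] < 25:
        return "edge"
    return "go"


def leaf(recommendations: List[str], preference: Sequence[str] = PREFERENCE) -> str:
    """Assumed LEAF reading: pick the recommendation ranked first in the preference order."""
    rank = {action: i for i, action in enumerate(preference)}
    return min(recommendations, key=rank.__getitem__)


def executor_step(observations: Dict[str, dict],
                  components: Dict[str, DecisionComponent]) -> str:
    # Instantiate a DC online for each newly observed entity.
    for entity_id in observations:
        if entity_id not in components:
            components[entity_id] = DecisionComponent(entity_id, vehicle_dp_policy)
    # Retire DCs whose entity is no longer observed.
    for entity_id in list(components):
        if entity_id not in observations:
            del components[entity_id]
    recommendations = [dc.recommend(observations[eid]) for eid, dc in components.items()]
    # Hypothetical default when no entities are present: the least conservative action.
    return leaf(recommendations) if recommendations else PREFERENCE[-1]


components: Dict[str, DecisionComponent] = {}
print(executor_step({"car1": {"distance": 30.0}, "car2": {"distance": 15.0}}, components))
print(executor_step({"car1": {"distance": 30.0}, "car3": {"distance": 5.0}}, components))
```

Because DCs are created and discarded as entities come and go, the number of entities never needs to be fixed in advance. That is the scaling problem the abstract highlights.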